We have decide to build on the flight data from OpenFlights.org that we used in our first project.
OpenFlights consists of the following datasets:
We have choosen to use the Routes and Airports datasets to build our network. We will use Airline ID and Source Airport ID from the Routes dataset as our two nodes. Each route record will be an edge between the two ntypes of nodes and we will add weighting by the number of routes that each airine has leaving from each particular airport. We will get additional airport node attribute data from the Airports dataset.
Variables used from the Routes dataset:
Variables used from the Airports dataset:
Data is saved as .DAT files. They are UTF-8 encoded.
First import necessary packages for plotting graphs using NetworkX and Matplotlib and set up graph size parameters...
import networkx as nx
from networkx.algorithms import bipartite as bi
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import math
%matplotlib inline
Read in Routes data to a Pandas dataframe. We only need the Airline and Airport columns so we selected for just those two.
routes_raw = pd.read_csv("Project1_Data/routes.dat",
header=None, sep=",",
usecols=[2, 4],
names=['Airline', 'Airport'])
routes_raw=routes_raw.dropna()
routes_raw.head(15)
We chose to count the number of routes for each eirline at each airport and use that as a weight for our airline to airport edges.
aa = routes_raw.copy()
aa['weight'] = aa.groupby('Airline')['Airport'].transform('count')
aa.head(10)
len(aa)
airports_raw = pd.read_csv("Project1_Data/airports.dat",
header=None, sep=",",
usecols=[1, 2, 3, 4, 6, 7],
names=['Airport Name','City','Country','IATA', 'Latitude', 'Longitude'])
#airports_raw1=airports_raw[airports_raw.Country=='France']
airports_raw.head()
len(airports_raw)
aa_merged = airports_raw.copy()
aa_merged = pd.merge(aa,aa_merged,left_on='Airport',right_on='IATA',how='inner')
aa_merged.head()
airport_nodes = aa_merged[['Airport','Airport Name','City','Country','IATA','Latitude','Longitude']].drop_duplicates().sort_values(by='Airport')
Through trial and error we discovered that the same three letter codes are used for both airlines and airports, so we have to add differentiation to the names before adding these airport atributes to the graph or all our airlines will also get coded as airports... To keep things short and so we can still read out graph once it's plotted, we added just an L for airLines and a P for AirPorts
airport_nodes.loc[airport_nodes['Airport'].isin(['AER', 'ASF', 'CEK', 'DME'])]
# ['AER', 'ASF', 'CEK', 'DME'] are airlines from the list above
aa_merged['Airline'] = 'L_' + aa_merged.Airline.map(str)
aa_merged['Airport'] = 'P_' + aa_merged.Airport.map(str)
airport_nodes['Airport'] = 'P_' + airport_nodes.Airport.map(str)
aa_merged.head()
First we will create a full bipartite network with both nodes.
B = nx.Graph()
B.add_nodes_from(aa_merged['Airline'], bipartite=0)
B.add_nodes_from(aa_merged['Airport'], bipartite=1)
B.add_weighted_edges_from([tuple(d) for d in aa_merged[['Airline','Airport','weight']].values])
nx.set_node_attributes(B, airport_nodes.set_index('Airport').to_dict('index'))
Let's double check if our attributes were saved in the graph data.
list(B.nodes(data=True))[:3]
list(B.nodes(data=True))[-2:]
B.get_edge_data('L_AER','P_KZN')
print(nx.info(B))
airline_nodes = {n for n, d in B.nodes(data=True) if d['bipartite']==0}
airport_nodes = set(B) - airline_nodes
nx.is_connected(B)
bi.is_bipartite(B)
print(bi.density(B, airline_nodes))
print(bi.density(B, airport_nodes))
The functions below will use the edge weights to trim the graph into subgraphs at different minimum weight levels. Any node not connected by at least one ede with the minimum weight will be cut from the graph. We used the textbook functions as a starter, but had to revise them to get them to work...
'''
# textbook function doesn't work...
def trim_edges(g, weight=1):
g2=nx.Graph()
for f, to, edata in g.edges(data=True):
if edata['weight'] > weight:
g2.add_edge(f, to, edata)
return g2
'''
def trim_edges(g, weight=1):
g2=nx.Graph()
my_list=[]
my_list1=[]
for f, to, edata in g.edges(data=True):
if edata['weight'] > weight:
my_list.append(f)
my_list1.append(to)
g2.add_edge(f,to,attr_dict={weight:edata['weight']})
g2.add_nodes_from(my_list, bipartite=0)
g2.add_nodes_from(my_list1, bipartite=1)
return g2
def island_method(g, iterations=5, weight=1):
weights = [edata['weight'] for f, to, edata in g.edges(data=True)]
mn=int(min(weights)) if int(min(weights)) > weight else weight
mx=int(max(weights))
#compute the size of the step, so we get a reasonable step in iterations
step=int((mx-mn)/(iterations-1))
return [[threshold, trim_edges(g, threshold)] for threshold in range(mn,mx,step)]
islands = island_method(B, 6, 200)
print('min weight - ', '# of nodes - ', '# of island subgraphs')
for i in islands:
# print the threshold level, size of the graph, and number of connected components
print(i[0], ' - ', len(i[1]), ' - ', len(list(nx.connected_component_subgraphs(i[1]))))
islands[1][0] # the first index refers to the threshold
islands[1][1] # the second index refers to the graph object
def set_colors(G):
colors = []
for node, data in G.nodes(data=True):
if data['bipartite'] == 1:
colors.append('olivedrab') # Airports in Blue
else:
colors.append('cornflowerblue') # Airlines in pink
return colors
G0=max(nx.connected_component_subgraphs(islands[0][1]), key=len)
plt.rcParams["figure.figsize"] = (15,15) # set plot size
colors = set_colors(G0) # set colors
#weights = [math.log(edata['attr_dict'][200]) for f, t, edata in G0.edges(data=True)] # set weights
nx.draw(G0, with_labels=True, node_color=colors, node_size=200,
font_size=10, font_weight='bold', edge_color="skyblue", alpha=0.5)
We can start to see already how many airports are connected by a much smaller number of airlines (as we would expect). We can also see how airlnes serve as boundary spanners between airports.Let's raise the water level though to see if we can gain more insight...
G1=islands[1][1]
plt.rcParams["figure.figsize"] = (15,15) # set plot size
colors = set_colors(G1) # set colors
nx.draw(G1, with_labels=True, node_color=colors, node_size=200,
font_size=10, font_weight='bold', edge_color="skyblue", alpha=0.5)
Major airlines are starting to be visible with the airports that they connect, but a higher water level may make it easier to pick out individual airlines and airports.
G2=islands[2][1]
plt.rcParams["figure.figsize"] = (15,15) # set plot size
colors = set_colors(G2) # set colors
nx.draw(G2, with_labels=True, node_color=colors, node_size=200,
font_size=10, font_weight='bold', edge_color="skyblue", alpha=0.5)
Here we can really start to see the major airlines in the system: PEK, FAR ORD, ATL, LAX, LHR etc...
G3=islands[3][1]
plt.rcParams["figure.figsize"] = (15,15) # set plot size
colors = set_colors(G3) # set colors
nx.draw(G3, with_labels=True, node_color=colors, node_size=200,
font_size=10, font_weight='bold', edge_color="skyblue", alpha=0.5)
# compute an affiliation network of the Airports
airports = bi.weighted_projected_graph(B, airport_nodes)
# Find the largest connected subgraph in the network
#users_subgraph = nx.connected_component_subgraphs(users)[0] # textbook code doesn't work
airports_subgraph = max(nx.connected_component_subgraphs(airports), key=len) # alternate method
airports_subgraph.name = "Airports"
print(nx.info(airports_subgraph))
airport_islands = island_method(airports_subgraph, 3, 50)
print('min weight - ', '# of nodes - ', '# of island subgraphs')
for i in airport_islands:
# print the threshold level, size of the graph, and number of connected components
print(i[0], ' - ', len(i[1]), ' - ', len(list(nx.connected_component_subgraphs(i[1]))))
APG0=max(nx.connected_component_subgraphs(airport_islands[0][1]), key=len)
# set plot size
plt.rcParams["figure.figsize"] = (15,15)
nx.draw(APG0, with_labels=True, node_color='olivedrab', node_size=200,
font_size=10, font_weight='bold', edge_color="olivedrab", alpha=0.5)
APG1a=airport_islands[1][1]
# set plot size
plt.rcParams["figure.figsize"] = (10,10)
nx.draw(APG1a, with_labels=True, node_color='olivedrab', node_size=200,
font_size=10, font_weight='bold', edge_color="olivedrab", alpha=0.5)
APG1b=max(nx.connected_component_subgraphs(airport_islands[1][1]), key=len)
# set plot size
plt.rcParams["figure.figsize"] = (8,8)
nx.draw(APG1b, with_labels=True, node_color='olivedrab', node_size=200,
font_size=10, font_weight='bold', edge_color="olivedrab", alpha=0.5)
# compute an affiliation network of the Airlines
airlines = bi.weighted_projected_graph(B, airline_nodes)
# Find the largest connected subgraph in the network
#users_subgraph = nx.connected_component_subgraphs(users)[0] # textbook code doesn't work
airlines_subgraph = max(nx.connected_component_subgraphs(airlines), key=len) # alternate method
airlines_subgraph.name = "Airlines"
print(nx.info(airlines_subgraph))
airline_islands=island_method(airlines_subgraph, 3, 50)
print('min weight - ', '# of nodes - ', '# of island subgraphs')
for i in airline_islands:
# print the threshold level, size of the graph, and number of connected components
print(i[0], ' - ', len(i[1]), ' - ', len(list(nx.connected_component_subgraphs(i[1]))))
ALG0=max(nx.connected_component_subgraphs(airline_islands[0][1]), key=len)
# set plot size
plt.rcParams["figure.figsize"] = (15,15)
nx.draw(ALG0, with_labels=True, node_color='cornflowerblue', node_size=200,
font_size=10, font_weight='bold', edge_color="skyblue", alpha=0.5)
ALG1a=airline_islands[1][1]
# set plot size
plt.rcParams["figure.figsize"] = (10,10)
nx.draw(ALG1a, with_labels=True, node_color='cornflowerblue', node_size=200,
font_size=10, font_weight='bold', edge_color="skyblue", alpha=0.5)
ALG1b=max(nx.connected_component_subgraphs(airline_islands[1][1]), key=len)
# set plot size
plt.rcParams["figure.figsize"] = (8,8)
nx.draw(ALG1b, with_labels=True, node_color='cornflowerblue', node_size=200,
font_size=10, font_weight='bold', edge_color="skyblue", alpha=0.5)